Approximating Aggregate Queries about Web Pages via Random Walks

Authors

  • Ziv Bar-Yossef
  • Alexander C. Berg
  • Steve Chien
  • Jittat Fakcharoenphol
  • Dror Weitz
Abstract

We present a random walk as an efficient and accurate approach to approximating certain aggregate queries about web pages. Our method uses a novel random walk to produce an almost uniformly distributed sample of web pages. The walk traverses a dynamically built regular undirected graph. Queries we have estimated using this method include the coverage of search engines, the proportion of pages belonging to .com and other domains, and the average size of web pages. Strong experimental evidence suggests that our walk produces accurate results quickly using very limited resources.
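The abstract relies on the fact that a random walk on a connected, regular, undirected graph converges to the uniform distribution over nodes. The following is a minimal sketch of that idea, not the authors' implementation. It assumes a small in-memory graph rather than a dynamically built one, and it assumes regularity is obtained by padding each node with self-loops up to a chosen `max_degree`. The names `regular_walk_sample`, `estimate_fraction`, `TOY_GRAPH`, and the host names are all hypothetical.

```python
import random


def regular_walk_sample(adj, start, steps, max_degree, rng):
    """Return the node reached after `steps` steps of a self-loop-padded walk.

    Assumption: every node is treated as having exactly `max_degree` edges,
    with the missing ones filled in by self-loops, so the graph is regular.
    """
    node = start
    for _ in range(steps):
        neighbors = adj[node]
        # Move to a uniformly chosen real neighbor with probability
        # deg(node) / max_degree; otherwise follow a self-loop and stay.
        if rng.random() < len(neighbors) / max_degree:
            node = rng.choice(neighbors)
    return node


def estimate_fraction(adj, predicate, samples, steps, max_degree, seed=0):
    """Estimate the fraction of nodes satisfying `predicate` from walk samples."""
    rng = random.Random(seed)
    start = next(iter(adj))
    hits = 0
    for _ in range(samples):
        if predicate(regular_walk_sample(adj, start, steps, max_degree, rng)):
            hits += 1
    return hits / samples


# Hypothetical toy "web": one hub linked to five leaves (undirected edges).
TOY_GRAPH = {
    "hub.com": ["a.org", "b.org", "c.org", "d.org", "e.com"],
    "a.org": ["hub.com"],
    "b.org": ["hub.com"],
    "c.org": ["hub.com"],
    "d.org": ["hub.com"],
    "e.com": ["hub.com"],
}

if __name__ == "__main__":
    share = estimate_fraction(
        TOY_GRAPH,
        predicate=lambda page: page.endswith(".com"),
        samples=3000,
        steps=200,
        max_degree=5,
    )
    print(f"estimated .com share: {share:.3f} (true share is 2/6)")
```

On this toy graph a plain random walk without the self-loop padding would land on the hub far more often than on any leaf, which would bias the estimated .com share upward. The padding is what makes every page roughly equally likely to be sampled.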

Related resources

Measuring Index Quality Using Random Walks on the Web

Recent research has studied how to measure the size of a search engine, in terms of the number of pages indexed. In this paper, we consider a different measure for search engines, namely the quality of the pages in a search engine index. We provide a simple, effective algorithm for approximating the quality of an index by performing a random walk on the Web, and we use this methodology to compare...

Web Communities Identification from Random Walks

We propose a technique for identifying latent Web communities based solely on the hyperlink structure of the WWW, via random walks. Although the topology of the Directed Web Graph encodes important information about the content of individual Web pages, it also reveals useful meta-level information about user communities. Random walk models are capable of propagating local link information throu...

A Comparison of Techniques for Sampling Web Pages

As the World Wide Web is growing rapidly, it is getting increasingly challenging to gather representative information about it. Instead of crawling the web exhaustively one has to resort to other techniques like sampling to determine the properties of the web. A uniform random sample of the web would be useful to determine the percentage of web pages in a specific language, on a topic or in a t...

A New Model for Phrase Search Based on Weighted Minimum Displacement

Finding high-quality web pages is one of the most important tasks of search engines. The relevance between the documents found and the query searched depends on the user observation and increases the complexity of ranking algorithms. The other issue is that users often explore just the first 10 to 20 results while millions of pages related to a query may exist. So search engines have to use sui...

www.stacs-conf.org FOREWORD

As the World Wide Web is growing rapidly, it is getting increasingly challenging to gather representative information about it. Instead of crawling the web exhaustively one has to resort to other techniques like sampling to determine the properties of the web. A uniform random sample of the web would be useful to determine the percentage of web pages in a specific language, on a t...

Publication date: 2000